ABSTRACT

A multimodal system with Poisson, Gaussian, and multinomial
observations is considered. A generative graphical model that
combines multiple modalities through common factor loadings is
proposed. In this model, latent factors are like summary objects
that have latent factor scores in each modality, and the observed
objects are represented in terms of such summary objects. This
potentially brings about a significant dimensionality reduction.
It also naturally enables a powerful means of clustering based on
a diverse set of observations. An expectation-maximization (EM)
algorithm to find the model parameters is provided. The algorithm
is tested on a Twitter dataset which consists of the counts and
geographical coordinates of hashtag occurrences, together with
the bag of words for each hashtag. The resultant factors
successfully localize the hashtags in all dimensions: counts,
coordinates, and topics. The algorithm is also extended to
accommodate the von Mises-Fisher distribution, which is used to
model the spherical coordinates.

Index Terms — multimodal data fusion, unsupervised 
learning, graphical models, Twitter 

1. INTRODUCTION 

In complex systems a variety of observation modes (e.g., sensor
readings, images, text) may be available to the decision maker.
For instance, emerging technologies such as cyber-physical
systems, the internet of things, autonomous driving, and the
smart grid can provide such a rich observation space, both in
volume and modality. In monitoring a complex system, all
modalities bear information about the system's internal state. In
some cases, an anomaly may not be detectable in any single
modality alone, but can be easily detected through joint
processing.

Unsupervised learning methods are instrumental in discovering
hidden structures in data (e.g., factor analysis [1], topic
modeling [2]), which can then be used to perform dimensionality
reduction, anomaly detection, and clustering. Conventionally,
they deal with unimodal data [1,2]. Efficient fusion of
multimodal data brings about information diversity and can
greatly improve the statistical inference performance [3]. For
example, data used for detection and estimation tasks are
strongly coupled when both problems are solved jointly,
considerably increasing the overall performance [4].

In [5], Gaussian and multinomial observations are jointly modeled
using a mixture of factor analyzers. Although that work is
conceptually similar to ours, its probabilistic model is quite
different. In [5], the latent factor scores are shared in the
system, which also synchronizes the modalities. Here we do not
impose a synchronized observation model. We let the different
modalities run in their own continuous-valued state, and only
link them through the factor loadings of objects. In contrast, in
[5], modalities differ in their factor loadings, which can take a
discrete set of values.

2. PROBLEM STATEMENT 

We consider $P$ objects (e.g., documents), each of which is
observed through three disparate information sources. In this
paper, we assume Poisson, Gaussian, and multinomial statistical
models for the information sources because of their wide
application areas. Specifically, the Poisson distribution is used
to model event counts; the Gaussian distribution is used for
continuous-valued observations; and the multinomial distribution
models categorical observations. (Real-world examples can be seen
in the Experiments section.)

The graphical model in Fig. 1 is assumed to generate the
multimodal observations $\{y_{in}\}$, $\{z_{im}\}$, and
$\{h_{id}\}$. For each object $i$, the latent factor scores
$x_n$, $w_m$, and $v_d$ for $K$ factors are mixed in the natural
parameters of the distributions through the unknown factor
loadings $c_i$. Gaussian priors are assumed for the latent factor
scores. In particular, each Poisson observation is conditionally
distributed as

$$y_{in} \mid x_n \sim \mathrm{Pois}\left(e^{c_i^T x_n}\right), \qquad x_n \sim \mathcal{N}(\zeta, R), \tag{1}$$

for $i = 1, \ldots, P$, $n = 1, \ldots, N$. Similarly, for each
Gaussian observation $z_{im}$ we have

$$z_{im} \mid w_m \sim \mathcal{N}\left(c_i^T w_m, \sigma_i^2\right), \qquad w_m \sim \mathcal{N}(\alpha, S), \tag{2}$$

for $m = 1, \ldots, M$. We use different observation indices for
different information sources since they do not need to be
synchronized and the total number of observations may vary. For
each multinomial observation $h_{id}$ we have

$$h_i \mid \{v_d\} \sim \mathrm{Mult}\left(L_i, \left\{\frac{e^{c_i^T v_d}}{\sum_{l=1}^{D} e^{c_i^T v_l}}\right\}_{d=1}^{D}\right), \qquad v_d \sim \mathcal{N}(\beta, Q), \tag{3}$$

where $L_i$ is the total number of instances and $h_{id}$ is the
number of instances observed under category $d$.

Fig. 1. Generative graphical model. Plate representation is used
to show repeated structures. Circles and rectangles represent
random and deterministic variables, respectively. Observed
variables are shaded. Each object $i$ has three disparate
observation streams: Poisson $\{y_{in}\}_n$, Gaussian
$\{z_{im}\}_m$, and multinomial $\{h_{id}\}_d$. Factor loadings
$c_i$ and the latent factor scores $x_n$, $w_m$, and $v_d$
constitute the multimodal factor model.

The above probabilistic models are similar to generalized linear
models since mixing occurs in the natural parameters. However,
unlike in generalized linear models, here the regressors $x_n$,
$w_m$, and $v_d$ are unknown, as are the regression coefficients
$c_i$.
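As an illustration of the generative model in (1)-(3), the following is a minimal sampling sketch in Python with NumPy. The function name `sample_multimodal`, the argument layout (loadings stacked in a $P \times K$ array `C`), and the seed handling are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def sample_multimodal(C, zeta, R, sigma2, alpha, S, beta, Q, N, M, D, L, seed=0):
    """Draw (Y, Z, H) from the model in (1)-(3).

    C: (P, K) factor loadings, one row c_i per object (hypothetical layout).
    sigma2: (P,) Gaussian noise variances; L: length-P list of multinomial totals.
    """
    rng = np.random.default_rng(seed)
    P, K = C.shape
    # Latent factor scores with Gaussian priors.
    X = rng.multivariate_normal(zeta, R, size=N)    # (N, K), Poisson modality
    W = rng.multivariate_normal(alpha, S, size=M)   # (M, K), Gaussian modality
    V = rng.multivariate_normal(beta, Q, size=D)    # (D, K), multinomial modality
    # Poisson counts with log-rate c_i^T x_n.
    Y = rng.poisson(np.exp(C @ X.T))                # (P, N)
    # Gaussian observations with mean c_i^T w_m and variance sigma_i^2.
    Z = C @ W.T + rng.normal(size=(P, M)) * np.sqrt(sigma2)[:, None]
    # Multinomial counts with softmax probabilities over c_i^T v_d.
    logits = C @ V.T                                # (P, D)
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    H = np.stack([rng.multinomial(L[i], probs[i]) for i in range(P)])
    return Y, Z, H
```

Note that the three modalities share only `C`; the numbers of observations `N`, `M`, and `D` are independent of each other, mirroring the unsynchronized observation model.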

3. EXPECTATION-MAXIMIZATION ALGORITHM 

In this section, we derive the expectation-maximization (EM)
algorithm to find the parameters

$$\theta = \{c_i, \zeta, R, \sigma_i^2, \alpha, S, \beta, Q\}.$$

3.1. Poisson E-step 

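We are interested in computing the expectation of the complete-data log-likelihood $E[\log P(\{y_n, x_n\}|\theta)]$ over the posterior distribution $P(\{x_n\}|\{y_n\}, \theta)$ of the latent factor scores. Due to the lack of conjugacy between the prior on $x_n$ and the Poisson likelihood, the posterior distribution does not have a closed-form expression. Therefore, as in [6], we approximate the posterior with a Gaussian whose mean is the mode of the posterior (i.e., the MAP estimate of $x_n$) and whose covariance is the negative inverse Hessian of the log posterior at that mode.

We start by writing

$$\begin{aligned}
\log P(x_n|y_n,\theta) &= \log P(y_n|x_n,\theta) + \log P(x_n|\theta) + C_1 \\
&= \sum_{i=1}^{P} \left(-e^{c_i^T x_n} + y_{in} c_i^T x_n\right) - \frac{1}{2} x_n^T R^{-1} x_n + \zeta^T R^{-1} x_n + C_2,
\end{aligned}$$

where $C_1$ and $C_2$ are constants that do not depend on $x_n$, and we used the Poisson likelihood from (1) and the multivariate Gaussian pdf. Then, the gradient and the Hessian are given by

$$\begin{aligned}
\nabla_{x_n} \log P(x_n|y_n,\theta) &= \sum_{i=1}^{P} \left(-e^{c_i^T x_n} + y_{in}\right) c_i - R^{-1}(x_n - \zeta), \\
\nabla^2_{x_n} \log P(x_n|y_n,\theta) &= -\sum_{i=1}^{P} e^{c_i^T x_n} c_i c_i^T - R^{-1}.
\end{aligned} \tag{4}$$

Since $\log P(x_n|y_n,\theta)$ is strictly concave in $x_n$, we can find the unique mode using the gradient and the Hessian in Newton's method, i.e.,

$$x_n^{t+1} = x_n^{t} - \left(\nabla^2_{x_n^t}\right)^{-1} \nabla_{x_n^t},$$

where $\nabla_{x_n^t}$ and $\nabla^2_{x_n^t}$ are the gradient and the Hessian at iteration $t$, computed using (4).

We approximate the posterior as $P(x_n|y_n,\theta) \approx \mathcal{N}(\xi_n, \Psi_n)$, where $\xi_n$ is the mode found by Newton's method and $\Psi_n$ is the negative inverse Hessian in (4) evaluated at $x_n = \xi_n$. Finally, the expected complete-data log-likelihood is given by

$$\begin{aligned}
E[\log P(\{y_n, x_n\}|\theta)] &= \sum_{n=1}^{N} E\left[\log P(y_n|x_n,\theta) + \log P(x_n|\theta)\right] \\
&= \sum_{n=1}^{N} \Big\{ \sum_{i=1}^{P} \left(-E\left[e^{c_i^T x_n}\right] + y_{in} c_i^T E[x_n]\right) - \frac{1}{2} E\left[x_n^T R^{-1} x_n\right] \\
&\qquad + \zeta^T R^{-1} E[x_n] - \frac{1}{2} \zeta^T R^{-1} \zeta - \frac{1}{2} \log|R| \Big\} + C_3 \\
&= \sum_{n=1}^{N} \Big\{ \sum_{i=1}^{P} \left(-E\left[e^{c_i^T x_n}\right] + y_{in} c_i^T \xi_n\right) - \frac{1}{2} \mathrm{Tr}\left(R^{-1}\left(\Psi_n + \xi_n \xi_n^T\right)\right) \\
&\qquad + \zeta^T R^{-1} \xi_n - \frac{1}{2} \zeta^T R^{-1} \zeta - \frac{1}{2} \log|R| \Big\} + C_3,
\end{aligned} \tag{5}$$

where $C_3$ is a constant and we used the fact that

$$E\left[x_n^T R^{-1} x_n\right] = E\left[\mathrm{Tr}\left(x_n^T R^{-1} x_n\right)\right] = E\left[\mathrm{Tr}\left(R^{-1} x_n x_n^T\right)\right] = \mathrm{Tr}\left(R^{-1} E\left[x_n x_n^T\right]\right), \tag{6}$$

with $E[x_n x_n^T] = \Psi_n + \xi_n \xi_n^T$ under the Gaussian approximation.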
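The Newton iteration and the resulting Gaussian approximation can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `poisson_laplace_posterior`, the stacked loading array `C` of shape $P \times K$, the initialization at the prior mean, and the undamped Newton step with a step-norm stopping rule are choices made for this example, not details given in the paper.

```python
import numpy as np

def poisson_laplace_posterior(y, C, zeta, R, n_iter=50, tol=1e-10):
    """Gaussian approximation N(xi, Psi) to P(x_n | y_n, theta).

    y: (P,) counts for one index n; C: (P, K) loadings; zeta: (K,); R: (K, K).
    """
    R_inv = np.linalg.inv(R)
    x = np.asarray(zeta, dtype=float).copy()    # start at the prior mean (assumption)
    for _ in range(n_iter):
        rate = np.exp(C @ x)                    # e^{c_i^T x_n} for every object i
        grad = C.T @ (y - rate) - R_inv @ (x - zeta)
        hess = -(C.T * rate) @ C - R_inv        # negative definite, see (4)
        step = np.linalg.solve(hess, grad)
        x = x - step                            # undamped Newton update
        if np.linalg.norm(step) < tol:
            break
    rate = np.exp(C @ x)
    hess = -(C.T * rate) @ C - R_inv
    Psi = np.linalg.inv(-hess)                  # negative inverse Hessian at the mode
    return x, Psi
```

Because the log posterior is strictly concave, the iteration has a unique fixed point; for large counts or poorly scaled loadings a damped step may be a safer choice than the plain update used here.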

3.2. Poisson M-step 

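In iteration $t + 1$, we find the parameters $\zeta^{t+1}$ and $R^{t+1}$ that maximize (5). Particularly,

$$\zeta^{t+1} = \arg\max_{\zeta} \; \zeta^T R^{-1} \sum_{n=1}^{N} \xi_n^t - \frac{N}{2} \zeta^T R^{-1} \zeta.$$

Equating the derivative to zero we find that the mean of the factor scores is given by the average of the posterior means, i.e.,

$$\zeta^{t+1} = \frac{1}{N} \sum_{n=1}^{N} \xi_n^t. \tag{7}$$

Similarly,

$$R^{t+1} = \arg\max_{R} \; \sum_{n=1}^{N} -\frac{1}{2} \mathrm{Tr}\left(R^{-1}\left(\Psi_n^t + \xi_n^t (\xi_n^t)^T\right)\right) + (\zeta^{t+1})^T R^{-1} \sum_{n=1}^{N} \xi_n^t - \frac{N}{2} (\zeta^{t+1})^T R^{-1} \zeta^{t+1} - \frac{N}{2} \log|R|.$$

Taking the derivative, using (7), and equating to zero we get

$$(R^{t+1})^{-1} \left[\sum_{n=1}^{N} \frac{1}{2} \left(\Psi_n^t + \xi_n^t (\xi_n^t)^T - \zeta^{t+1} (\xi_n^t)^T\right)\right] (R^{t+1})^{-1} - \frac{N}{2} (R^{t+1})^{-1} = 0,$$

$$R^{t+1} = \frac{1}{N} \sum_{n=1}^{N} \left(\Psi_n^t + \xi_n^t (\xi_n^t)^T\right) - \zeta^{t+1} (\zeta^{t+1})^T. \tag{8}$$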
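The closed-form updates for the prior mean and covariance of the Poisson factor scores can be written in a few lines. The sketch below assumes the posterior means and covariances have been stacked into arrays `Xi` ($N \times K$) and `Psi` ($N \times K \times K$); the function name and array layout are hypothetical.

```python
import numpy as np

def poisson_m_step(Xi, Psi):
    """Update the prior mean and covariance of the Poisson factor scores.

    Xi: (N, K) posterior means; Psi: (N, K, K) posterior covariances.
    """
    N = Xi.shape[0]
    zeta_new = Xi.mean(axis=0)                            # average posterior mean
    second_moment = Psi.mean(axis=0) + (Xi.T @ Xi) / N    # average of E[x_n x_n^T]
    R_new = second_moment - np.outer(zeta_new, zeta_new)  # subtract outer product of mean
    return zeta_new, R_new
```

The covariance update is the average posterior second moment minus the outer product of the new mean, which keeps `R_new` symmetric whenever every `Psi[n]` is symmetric.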


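3.3. Gaussian E-step

We want to compute $E[\log P(\{z_m, w_m\}|\theta)]$ over the posterior distribution $P(\{w_m\}|\{z_m\}, \theta)$ of the latent factor scores. Since the Gaussian prior on the factor scores is conjugate to the Gaussian likelihood, the posterior is also Gaussian. To find its mean and covariance, from (2), we write

$$\begin{aligned}
P(z_m, w_m|\theta) &= P(z_m|w_m,\theta) P(w_m|\theta) \\
&= \frac{\exp\left(-\frac{1}{2}(z_m - C w_m)^T \Sigma^{-1} (z_m - C w_m) - \frac{1}{2}(w_m - \alpha)^T S^{-1} (w_m - \alpha)\right)}{(2\pi)^{\frac{P+K}{2}} |\Sigma|^{\frac{1}{2}} |S|^{\frac{1}{2}}},
\end{aligned}$$

where $C = [c_1 \ldots c_P]^T$ and $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_P^2)$. Collecting the terms that depend on $w_m$ together and completing the square we finally obtain the posterior mean and covariance as

$$\alpha_m = B\left(C^T \Sigma^{-1} z_m + S^{-1} \alpha\right), \qquad B = \left(C^T \Sigma^{-1} C + S^{-1}\right)^{-1}, \tag{9}$$

respectively.

Then, the expected complete-data log-likelihood is written as

$$\begin{aligned}
E[\log P(\{z_m, w_m\}|\theta)] &= \sum_{m=1}^{M} E\left[\log P(z_m|w_m,\theta) + \log P(w_m|\theta)\right] \\
&= -\frac{1}{2} \sum_{m=1}^{M} \Big\{ \sum_{i=1}^{P} \left(\frac{1}{\sigma_i^2} E\left[(z_{im} - c_i^T w_m)^2\right] + \log(2\pi\sigma_i^2)\right) \\
&\qquad + E\left[(w_m - \alpha)^T S^{-1} (w_m - \alpha)\right] + \log\left((2\pi)^K |S|\right) \Big\}.
\end{aligned} \tag{10}$$

3.4. Gaussian M-step

At each iteration $t + 1$, we find the parameters $\alpha^{t+1}$, $S^{t+1}$, and $(\sigma_i^2)^{t+1}$ that maximize (10). Specifically,

$$\begin{aligned}
\alpha^{t+1} &= \arg\max_{\alpha} \; -\frac{1}{2} \sum_{m=1}^{M} E\left[(w_m - \alpha)^T S^{-1} (w_m - \alpha)\right] \\
&= \arg\max_{\alpha} \; \alpha^T S^{-1} \sum_{m=1}^{M} \alpha_m^t - \frac{M}{2} \alpha^T S^{-1} \alpha,
\end{aligned}$$

where $\alpha_m^t$, given in (9), is the posterior mean $E[w_m]$ at iteration $t$. Taking the derivative we get

$$\alpha^{t+1} = \frac{1}{M} \sum_{m=1}^{M} \alpha_m^t. \tag{11}$$

Similarly,

$$\begin{aligned}
S^{t+1} &= \arg\max_{S} \; -\frac{1}{2} \sum_{m=1}^{M} E\left[(w_m - \alpha^{t+1})^T S^{-1} (w_m - \alpha^{t+1})\right] - \frac{M}{2} \log|S| \\
&= \arg\max_{S} \; -\frac{1}{2} \sum_{m=1}^{M} \mathrm{Tr}\left(S^{-1}\left(B + \alpha_m^t (\alpha_m^t)^T\right)\right) + (\alpha^{t+1})^T S^{-1} \sum_{m=1}^{M} \alpha_m^t \\
&\qquad - \frac{M}{2} (\alpha^{t+1})^T S^{-1} \alpha^{t+1} - \frac{M}{2} \log|S|,
\end{aligned}$$

where we used (6) to write $E[w_m^T S^{-1} w_m]$. Similar to (8), taking the derivative we find

$$S^{t+1} = \frac{1}{M} \sum_{m=1}^{M} \left(B + \alpha_m^t (\alpha_m^t)^T\right) - \alpha^{t+1} (\alpha^{t+1})^T. \tag{12}$$

We next find

$$(\sigma_i^2)^{t+1} = \arg\max_{\sigma_i^2} \; -\frac{1}{2\sigma_i^2} \sum_{m=1}^{M} E\left[(z_{im} - c_i^T w_m)^2\right] - \frac{M}{2} \log \sigma_i^2,$$

where, from (9), $c_i^T w_m | z_m \sim \mathcal{N}(c_i^T \alpha_m, c_i^T B c_i)$, hence $E[(z_{im} - c_i^T w_m)^2] = c_i^T B c_i + (z_{im} - c_i^T \alpha_m)^2$. Equating the derivative to zero and solving for $\sigma_i^2$ we get

$$(\sigma_i^2)^{t+1} = \frac{1}{M} \sum_{m=1}^{M} \left(z_{im} - c_i^T \alpha_m^t\right)^2 + c_i^T B c_i. \tag{13}$$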
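The Gaussian E-step in (9) and the M-step updates in (11)-(13) are all closed form, so one EM pass for this modality can be sketched compactly. The function names, the array layout (`Z` of shape $P \times M$, `C` of shape $P \times K$, `sigma2` of shape $P$), and the use of explicit matrix inverses are assumptions made for illustration only.

```python
import numpy as np

def gaussian_e_step(Z, C, sigma2, alpha, S):
    """Posterior means alpha_m (rows of A) and shared covariance B, as in (9)."""
    S_inv = np.linalg.inv(S)
    Ct_Sig_inv = C.T / sigma2                       # C^T Sigma^{-1}, shape (K, P)
    B = np.linalg.inv(Ct_Sig_inv @ C + S_inv)
    A = (B @ (Ct_Sig_inv @ Z + (S_inv @ alpha)[:, None])).T   # (M, K)
    return A, B

def gaussian_m_step(Z, C, A, B):
    """Closed-form updates (11), (12), and (13)."""
    M = A.shape[0]
    alpha_new = A.mean(axis=0)                                  # (11)
    S_new = B + (A.T @ A) / M - np.outer(alpha_new, alpha_new)  # (12)
    resid = Z - C @ A.T                                         # z_im - c_i^T alpha_m
    quad = np.einsum("pk,kl,pl->p", C, B, C)                    # c_i^T B c_i per object
    sigma2_new = (resid ** 2).mean(axis=1) + quad               # (13)
    return alpha_new, S_new, sigma2_new
```

Since $B$ does not depend on $m$, it is computed once per iteration and reused for every observation index.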

3.5. Multinomial E-step 


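In the multinomial case, for identifiability, we use the last category as pivot and write the likelihood in terms of the alternative factor scores $u_d = v_d - v_D$, $d = 1, \ldots, D$,

$$P(h_i|\{u_d\},\theta) = \prod_{d=1}^{D} \left(\frac{e^{\eta_{id}}}{\sum_{l=1}^{D} e^{\eta_{il}}}\right)^{h_{id}} = \prod_{d=1}^{D} e^{\left[\eta_{id} - \mathrm{lse}(\eta_i)\right] h_{id}}, \tag{14}$$

where $\eta_{id} = c_i^T u_d$ and $\eta_i = [\eta_{i1} \ldots \eta_{iD-1}]^T$. The normalizing term in the probability expression, also called the log-sum-exp (lse) function, prevents a closed-form solution for the posterior. Finding a quadratic upper bound for it we can bound the likelihood from below, and in turn the expected complete-data log-likelihood, which we want to maximize.

Using the Taylor series of $\mathrm{lse}(\eta_i)$ we can find such a quadratic bound [7] as follows.

$$\begin{aligned}
\mathrm{lse}(\eta_i) &= \mathrm{lse}(\gamma_i) + (\eta_i - \gamma_i)^T \nabla \mathrm{lse}(\gamma_i) + \frac{1}{2} (\eta_i - \gamma_i)^T \nabla^2 \mathrm{lse}\left(\gamma_i + \epsilon(\eta_i - \gamma_i)\right) (\eta_i - \gamma_i) \\
&\le \frac{1}{2} \eta_i^T A \eta_i - b_{\gamma_i}^T \eta_i + c_{\gamma_i}.
\end{aligned} \tag{15}$$

To show the above inequality note that $\nabla \mathrm{lse}(\gamma_i)$ is the probability vector $p_i(\gamma_i)$, and $\nabla^2 \mathrm{lse} = \Lambda_{p_i} - p_i p_i^T$ where $\Lambda_{p_i} = \mathrm{diag}(p_{i1}, \ldots, p_{iD-1})$. In [7], the latter is shown to be bounded as follows

$$\nabla^2 \mathrm{lse} \le A = \frac{1}{2} \left(I_{D-1} - \frac{1_{D-1} 1_{D-1}^T}{D}\right),$$

where $I_{D-1}$ and $1_{D-1}$ are the identity matrix and the vector of ones of size $(D-1) \times (D-1)$ and $(D-1) \times 1$, respectively. Substituting $A$ and $\nabla \mathrm{lse}(\gamma_i)$ and organizing the terms gives us the inequality in (15), where

$$b_{\gamma_i} = A \gamma_i - p_i(\gamma_i), \qquad c_{\gamma_i} = \mathrm{lse}(\gamma_i) + \frac{1}{2} \gamma_i^T A \gamma_i - \gamma_i^T p_i(\gamma_i).$$

From (14) and (15), we get

$$\log P(h_i|\{u_d\},\theta) \ge h_i^T \eta_i - \frac{1}{2} \eta_i^T A \eta_i + b_{\gamma_i}^T \eta_i - c_{\gamma_i}.$$

Exponentiating we obtain the following lower bound for the likelihood

$$P(h_i|\{u_d\},\theta) \ge \mathcal{N}\left(\tilde{h}_i \mid \eta_i, A^{-1}\right) f_i(\gamma_i),$$

where $\tilde{h}_i = A^{-1}(b_{\gamma_i} + h_i) = A^{-1}(h_i - p_i(\gamma_i)) + \gamma_i$ is the Gaussian pseudo-observation.

Since the factor scores $\{u_d\}$ are correlated given the observations, we seek the posterior of the combined vector $u = [u_1^T \ldots u_{D-1}^T]^T \sim \mathcal{N}(0, \tilde{Q})$ where $0$ is the zero vector and $\tilde{Q} = I_{D-1} \otimes Q$ is a block-diagonal matrix. Similarly, defining $\mathbf{C}_i = I_{D-1} \otimes c_i$ we can write $\eta_i = \mathbf{C}_i^T u$. Then, for the complete-data likelihood we have

$$P(\{h_i\}, u|\theta) \ge \prod_{i=1}^{P} \mathcal{N}\left(\tilde{h}_i \mid \mathbf{C}_i^T u, A^{-1}\right) f_i(\gamma_i) \; \mathcal{N}(u \mid 0, \tilde{Q}). \tag{16}$$

Organizing the terms and completing the square we write

$$P(\{h_i\}, u|\theta) \ge \mathcal{N}(u \mid \phi, \Phi) \; g(\{h_i, \gamma_i\}),$$

where

$$\phi = \Phi \sum_{i=1}^{P} \mathbf{C}_i A \tilde{h}_i, \qquad \Phi = \left(\sum_{i=1}^{P} \mathbf{C}_i A \mathbf{C}_i^T + \tilde{Q}^{-1}\right)^{-1} \tag{17}$$

are the posterior mean and covariance.

3.6. Multinomial M-step

We maximize the expected complete-data log-likelihood of the lower bound given in (16) using the posterior mean and covariance, given in (17), as in [5].

$$\begin{aligned}
Q^{t+1} &= \arg\max_{Q} \; E[\log P(\{h_i\}, u|\theta)] \\
&= \arg\max_{Q} \; -\frac{1}{2} E\left[u^T \left(\sum_{i=1}^{P} \mathbf{C}_i A \mathbf{C}_i^T + \tilde{Q}^{-1}\right) u\right] + E[u]^T \sum_{i=1}^{P} \mathbf{C}_i A \tilde{h}_i - \frac{D-1}{2} \log|Q| + C_4
\end{aligned} \tag{18}$$

$$\begin{aligned}
Q^{t+1} &= \arg\max_{Q} \; -\frac{1}{2} \mathrm{Tr}\left(\tilde{Q}^{-1}\left(\Phi + \phi \phi^T\right)\right) - \frac{D-1}{2} \log|Q| \\
&= \arg\max_{Q} \; -\frac{1}{2} \sum_{d=1}^{D-1} \mathrm{Tr}\left(Q^{-1}\left(\Phi_d + \phi_d \phi_d^T\right)\right) - \frac{D-1}{2} \log|Q|,
\end{aligned} \tag{19}$$

where $C_4$ is a constant, $\phi_d$ is the $d$th vector of size $K$ in $\phi$, and $\Phi_d$ is the $d$th matrix of size $K \times K$ on the diagonal of $\Phi$. The last equality follows from the fact that $\tilde{Q}$ is block diagonal. We used (6) for $E[u^T \tilde{Q}^{-1} u]$. Similar to (8) and (12) we find

$$Q^{t+1} = \frac{1}{D-1} \sum_{d=1}^{D-1} \left(\Phi_d + \phi_d \phi_d^T\right). \tag{20}$$

Since (15) holds with equality for $\gamma_i = \eta_i$, and the curvature does not depend on $\eta_i$, it can be shown that the optimal value for $\gamma_i$ is $\mathbf{C}_i^T \phi$ [5]. Note that using the factor scores $u_d \sim \mathcal{N}(0, Q)$ the mean vector $\beta$ in (3) is not needed.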
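The quadratic bound in (15) can be checked numerically. The sketch below evaluates the right-hand side of (15) for a given expansion point $\gamma_i$ and compares it with the exact log-sum-exp value. The function names `lse` and `quadratic_lse_bound` are hypothetical, and the pivot category is assumed to contribute a fixed logit of zero, so that $\mathrm{lse}(\eta) = \log(1 + \sum_d e^{\eta_d})$.

```python
import numpy as np

def lse(eta):
    """log(1 + sum_d exp(eta_d)), with the pivot category fixed at logit 0."""
    m = max(0.0, float(np.max(eta)))            # shift for numerical stability
    return m + np.log(np.exp(-m) + np.sum(np.exp(eta - m)))

def quadratic_lse_bound(eta, gamma):
    """Right-hand side of (15): 0.5 eta^T A eta - b^T eta + c, expanded at gamma."""
    D = eta.size + 1                            # number of categories incl. pivot
    A = 0.5 * (np.eye(D - 1) - np.ones((D - 1, D - 1)) / D)
    p = np.exp(gamma - lse(gamma))              # gradient of lse at gamma
    b = A @ gamma - p
    c = lse(gamma) + 0.5 * gamma @ A @ gamma - gamma @ p
    return 0.5 * eta @ A @ eta - b @ eta + c
```

For example, with `eta = np.array([0.3, -1.2, 2.0])` the bound expanded at `gamma = np.zeros(3)` lies above `lse(eta)`, and it becomes tight when `gamma` equals `eta`, which is the property used to choose the optimal $\gamma_i$.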


3.7. Factor Loadings 

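Finally, combining (5), (10) and (18) we compute $c_i^{t+1}$ as follows

$$\begin{aligned}
c_i^{t+1} &= \arg\max_{c_i} \; E\left[\log P(\{y_n, x_n\}, \{z_m, w_m\}, \{h_i\}, u|\theta)\right] \\
&= \arg\max_{c_i} \; \sum_{n=1}^{N} \left(-E\left[e^{c_i^T x_n}\right] + y_{in} c_i^T \xi_n\right) - \frac{1}{2\sigma_i^2} \sum_{m=1}^{M} E\left[(c_i^T w_m - z_{im})^2\right] \\
&\qquad - \frac{1}{2} \mathrm{Tr}\left(\mathbf{C}_i A \mathbf{C}_i^T \left(\Phi + \phi \phi^T\right)\right) + \phi^T \mathbf{C}_i A \tilde{h}_i.
\end{aligned} \tag{21}$$

Note that $c_i^T x_n | \{y_n\} \sim \mathcal{N}(c_i^T \xi_n, c_i^T \Psi_n c_i)$. Completing the square in the Gaussian integral it is straightforward to show

$$E\left[e^{c_i^T x_n}\right] = e^{c_i^T \xi_n + \frac{1}{2} c_i^T \Psi_n c_i}.$$

Moreover, we can write $\mathrm{Tr}\left(\mathbf{C}_i A \mathbf{C}_i^T (\Phi + \phi \phi^T)\right) = c_i^T U c_i$ and $\phi^T \mathbf{C}_i A \tilde{h}_i = c_i^T F A \tilde{h}_i$, where $F = [\phi_1 \ldots \phi_{D-1}]$ and, since $\mathbf{C}_i A \mathbf{C}_i^T = A \otimes c_i c_i^T$,

$$U = \sum_{d=1}^{D-1} \sum_{e=1}^{D-1} A_{de} \left(\Phi_{ed} + \phi_e \phi_d^T\right),$$

with $A_{de}$ the $(d,e)$th entry of $A$ and $\Phi_{ed}$ the $(e,d)$th $K \times K$ block of $\Phi$ (so that $\Phi_{dd} = \Phi_d$).

We can use Newton's method to find $c_i^{t+1}$ through the gradient and the Hessian, i.e.,

$$\begin{aligned}
\nabla_{c_i} &= \sum_{n=1}^{N} \left(-e^{c_i^T \xi_n + \frac{1}{2} c_i^T \Psi_n c_i} (\xi_n + \Psi_n c_i) + y_{in} \xi_n\right) - \frac{1}{\sigma_i^2} \sum_{m=1}^{M} \left(\left(B + \alpha_m \alpha_m^T\right) c_i - z_{im} \alpha_m\right) - U c_i + F A \tilde{h}_i, \\
\nabla^2_{c_i} &= -\sum_{n=1}^{N} e^{c_i^T \xi_n + \frac{1}{2} c_i^T \Psi_n c_i} \left[(\xi_n + \Psi_n c_i)(\xi_n + \Psi_n c_i)^T + \Psi_n\right] - \frac{1}{\sigma_i^2} \sum_{m=1}^{M} \left(B + \alpha_m \alpha_m^T\right) - U.
\end{aligned} \tag{22}$$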
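The closed-form expectation $E[e^{c_i^T x_n}]$ used in the gradient and the Hessian can be sanity-checked numerically. The sketch below compares the closed form with a Gauss-Hermite quadrature over the scalar $c_i^T x_n$; the function names and the quadrature order are assumptions made for this example.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def expected_exp(c, xi, Psi):
    """Closed form of E[exp(c^T x)] for x ~ N(xi, Psi)."""
    return float(np.exp(c @ xi + 0.5 * c @ Psi @ c))

def expected_exp_quadrature(c, xi, Psi, order=40):
    """Same expectation by Gauss-Hermite quadrature on the scalar c^T x."""
    mean = c @ xi
    std = np.sqrt(c @ Psi @ c)
    # Nodes and weights for the weight function exp(-t^2 / 2).
    nodes, weights = hermegauss(order)
    return float(np.sum(weights * np.exp(mean + std * nodes)) / np.sqrt(2.0 * np.pi))
```

Both functions agree to numerical precision for moderate values of $c_i^T \Psi_n c_i$, which is why the Poisson term in (21) needs no sampling or numerical integration inside the Newton iterations.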


4. EXPERIMENTS

We test our algorithm on a Twitter dataset that is filtered from
10% of the tweets in January 2013. In our dataset, we consider
2444 hashtags as objects (i.e., $P = 2444$), and analyze their
number of occurrences in 743 hours (i.e., $N = 743$), available
geographical coordinates, and word counts. We model the hashtag
occurrences (i.e., the number of tweets that mention a hashtag)
using the Poisson distribution. Word counts are modeled using the
multinomial distribution with a dictionary size of 2645 after
eliminating the words that appear fewer than 100 times in the
whole dataset.

The number of available coordinates ranges between 10 
and 10456 with a mean of 134. Since geographical coordi¬ 
nates (latitude and longitude) constitute spherical data, they 
are better modeled using von Mises-Fisher (vMF) distribution 
than Gaussian, which is treated initially due to its popularity. 
Thus, we here present an extension of our algorithm for vMF 
distribution. The observation model for vMF distribution is 
given below 

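$$z_{im} \mid W_m \sim \mathrm{vMF}\left(c_i^T W_m, \kappa_i\right), \qquad w_{mk} \sim \mathrm{vMF}(\alpha_k, s_k),$$

for $m = 1, \ldots, M_i$, where $z_{im}$ is the spherical coordinate vector obtained from the original latitude and longitude information, and $W_m = [w_{m1} \ldots w_{mK}]^T$ is the latent factor scores for all three dimensions. The scores $w_{mk}$ for each factor $k$ are also vMF distributed.

The vMF likelihood is given by

$$P(z_{im}|W_m,\theta) = C(\kappa_i) \exp\left(\kappa_i c_i^T W_m z_{im}\right),$$

where $C(\kappa_i) = \kappa_i / \left(2\pi (e^{\kappa_i} - e^{-\kappa_i})\right)$ is the normalizing constant in three dimensions, $\kappa_i$ is the concentration parameter, and $c_i^T W_m$ is the mean direction. Combining the likelihood with the prior it is straightforward to show that the posterior $P(w_{mk}|\{z_{im}\},\theta)$ is also vMF with mean and concentration given by

$$\alpha_{mk} = \frac{s_k \alpha_k + \sum_{i} \kappa_i c_{ik} z_{im}}{s_{mk}}, \qquad s_{mk} = \Big\| s_k \alpha_k + \sum_{i} \kappa_i c_{ik} z_{im} \Big\|,$$

respectively. Maximizing the expected complete-data log-likelihood $E[\log P(\{z_{im}, w_{mk}\}|\theta)]$ we get

$$\alpha_k^{t+1} = \frac{\sum_{m=1}^{M} E[w_{mk}]}{\big\| \sum_{m=1}^{M} E[w_{mk}] \big\|}, \qquad M = \max_i(M_i).$$

Using the well-known approximation we estimate the concentration parameters as

$$s_k^{t+1} = \frac{3 \bar{r}_k - \bar{r}_k^3}{1 - \bar{r}_k^2}, \qquad \kappa_i^{t+1} = \frac{3 \bar{R}_i - \bar{R}_i^3}{1 - \bar{R}_i^2},$$

where $\bar{r}_k$ and $\bar{R}_i$ are the mean resultant lengths, averaged over the $M$ factor scores of factor $k$ and over the $M_i$ observations of object $i$, respectively.

January 5th and 18th. Astrology hashtags have regularly high counts. The last factor focuses on the East Coast sports teams in football, basketball, and baseball.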
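To make the spherical-data step concrete, the sketch below converts latitude and longitude to unit vectors and applies the concentration approximation quoted above to a mean resultant length. It uses only the Python standard library. The function names and the degree-based input convention are assumptions for illustration, and the estimate is applied directly to observed directions rather than to the posterior expectations used inside the EM updates.

```python
import math

def latlon_to_unit_vector(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def mean_resultant_length(vectors):
    """Norm of the average of a list of 3-D unit vectors (between 0 and 1)."""
    n = len(vectors)
    sx = sum(v[0] for v in vectors)
    sy = sum(v[1] for v in vectors)
    sz = sum(v[2] for v in vectors)
    return math.sqrt(sx * sx + sy * sy + sz * sz) / n

def vmf_concentration(r_bar, dim=3):
    """Approximate concentration (dim * r - r^3) / (1 - r^2); requires r_bar < 1."""
    return r_bar * (dim - r_bar ** 2) / (1.0 - r_bar ** 2)
```

Tightly clustered coordinates give a mean resultant length close to one and therefore a large concentration, while widely scattered coordinates give a value near zero; identical points are a degenerate case in which the approximation divides by zero.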


5. CONCLUSION 

A generative graphical model and an EM algorithm to analyze it
have been proposed to summarize multimodal objects that consist
of disparate observations. In a one-month Twitter dataset, the
discovered factors, each of which acts as a summary multimodal
object, have been shown to successfully localize hashtags in
terms of popularity, geography, and topic.
Table 1. Factors localized in terms of popularity, geography,
and topic. Numbers in parentheses denote the average popularity
scores.

Europe Soccer      US New Year             Middle East Countries   Astrology            East Coast Sports
#CFC (83)          #HappyNewYear (49)      #bahrain (133)          #Aries (210)         #Patriots (29)
#FACup (40)        #NewYear (13)           #Pakistan (30)          #Pisces (231)        #Knicks (18)
#Arsenal (33)      #newyearseve (3)        #India (11)             #Capricorn (210)     #GoHawks (20)
#ballondor (17)    #Best2012Memories (4)   #kuwait (5)             #Virgo (192)         #Steelers (2)
#realmadrid (14)   #Feliz2013 (1)          #Iraq (2)               #Sagittarius (216)   #Yankees (3)



































